Abstract

In this paper, a distributed stochastic approximation algorithm is studied. Applications of such
algorithms include decentralized estimation, optimization, control or computing. The algorithm consists
of two steps: a local step, where each node in a network updates a local estimate using a stochastic
approximation algorithm with decreasing step size, and a gossip step, where a node computes a local
weighted average between its estimates and those of its neighbors. Convergence of the estimates toward
a consensus is established under weak assumptions. The approach relies on two main ingredients: the
existence of a Lyapunov function for the mean field in the agreement subspace, and a contraction
property of the random matrices of weights in the subspace orthogonal to the agreement subspace. A
second order analysis of the algorithm is also performed in the form of a Central Limit Theorem.
The Polyak-averaged version of the algorithm is also considered.



I. Introduction 

Stochastic approximation has been a very active research area for the last sixty years (see
e.g. [1], [2]). The pattern for a stochastic approximation algorithm is provided by the recursion
$\theta_n = \theta_{n-1} + \gamma_n Y_n$, where $\theta_n$ is typically an $\mathbb{R}^d$-valued sequence of parameters, $Y_n$ is a sequence
of random observations, and $\gamma_n$ is a deterministic sequence of step sizes. An archetypal example
of such algorithms is provided by stochastic gradient algorithms. These are characterized by the
fact that $Y_n = -\nabla g(\theta_{n-1}) + \xi_n$ where $g$ is a function to be minimized, and where $(\xi_n)_{n\ge 0}$ is a
noise sequence corrupting the observations.
In the traditional setting, sensing and processing capabilities needed for the implementation of
a stochastic approximation algorithm are centralized on one machine. Alternatively, distributed
versions of these algorithms where the updates are done by a network of communicating nodes
(or agents) have recently aroused a great deal of interest. Applications include decentralized
estimation, control, optimization, and parallel computing.

In this paper, we consider a network composed of $N$ nodes (sensors, robots, computing
units, ...). Node $i$ generates an $\mathbb{R}^d$-valued stochastic process $(\theta_{n,i})_{n\ge 1}$ through a two-step iterative
algorithm: a local step and a so-called gossip step. At time $n$:

[Local step] Node $i$ generates a temporary iterate $\tilde\theta_{n,i}$ given by

$$\tilde\theta_{n,i} = \theta_{n-1,i} + \gamma_n Y_{n,i} , \tag{1}$$

where $\gamma_n$ is a deterministic positive step size and where the $\mathbb{R}^d$-valued random process $(Y_{n,i})_{n\ge 1}$
represents the observations made by agent $i$.

[Gossip step] Node $i$ is able to observe the values $\tilde\theta_{n,j}$ of some other $j$'s and computes
the weighted average:

$$\theta_{n,i} = \sum_{j=1}^{N} w_n(i,j)\,\tilde\theta_{n,j} ,$$

where the $w_n(i,j)$'s are scalar non-negative random coefficients such that $\sum_{j=1}^{N} w_n(i,j) = 1$
for any $i$. The sequence of random matrices $W_n := [w_n(i,j)]_{i,j=1}^{N}$ represents the time-varying
communication network between the nodes. These matrices are called row-stochastic, since they
have non-negative elements and satisfy $W_n \mathbf{1} = \mathbf{1}$ where $\mathbf{1}$ is the $N \times 1$ vector whose components
are all equal to one.
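
As an illustration of the two steps above, the following Python sketch runs the iteration for scalar estimates ($d = 1$). Everything specific in it is an assumption made for the example and is not taken from the paper: the fixed ring-averaging weights (a deterministic, hence degenerate, choice of $W_n$), the observation model $Y_{n,i} = -(\theta_{n-1,i} - a_i) + \text{noise}$ with hypothetical targets $a_i$, and the step size $\gamma_n = 1/n$.

```python
import random


def local_step(theta, gamma, observations):
    """Local step: each node adds gamma * Y_{n,i} to its previous estimate."""
    return [t + gamma * y for t, y in zip(theta, observations)]


def gossip_step(tilde_theta, weights):
    """Gossip step: node i computes sum_j w(i, j) * tilde_theta[j]."""
    return [sum(w * t for w, t in zip(row, tilde_theta)) for row in weights]


def ring_weights(n_nodes):
    """Hypothetical fixed weights: node i averages itself with node i+1 (mod N).

    Each row sums to one, so the matrix is row-stochastic.
    """
    weights = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        weights[i][i] = 0.5
        weights[i][(i + 1) % n_nodes] = 0.5
    return weights


def run_distributed_sa(targets, n_iter, noise_std=0.1, seed=0):
    """Run n_iter rounds of local step + gossip step, starting from zero."""
    rng = random.Random(seed)
    n_nodes = len(targets)
    weights = ring_weights(n_nodes)
    theta = [0.0] * n_nodes
    for n in range(1, n_iter + 1):
        gamma = 1.0 / n  # assumed step size gamma_n = 1/n
        # Assumed observation model: noisy pull of node i towards its target a_i.
        observations = [
            -(theta[i] - targets[i]) + rng.gauss(0.0, noise_std)
            for i in range(n_nodes)
        ]
        theta = gossip_step(local_step(theta, gamma, observations), weights)
    return theta
```

With these choices the mean field is $h(\theta) = -(\theta - \bar a)$, where $\bar a$ is the average of the targets, so the nodes are expected to agree on a value close to $\bar a$.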

This paper analyzes the convergence of this algorithm under some mild assumptions. In
particular, due to the matrices $W_n$, the estimates will eventually reach a consensus in the
sense that the differences $\theta_{n,i} - \theta_{n,j}$ between the estimates of any two nodes $i$ and $j$ almost
surely converge to zero as $n \to \infty$. Asymptotic fluctuations of the estimates will also be studied
through Central Limit Theorems.

There is a rich literature on distributed estimation and optimization algorithms; see [3], [4], [5],
[6], [7], [8] for a non-exhaustive list. Among the first gossip algorithms are those considered in the
treatise [9] and in [10]. The case where the gossip matrices are random and the observations are
noiseless is considered in [11]. The authors of [7] solve a constrained optimization problem, also using
noiseless estimates. The contributions [6] and [8] consider the framework of linear regression
models. In [12], stochastic gradient algorithms are considered in the case where the gossip matrices
$(W_n)_n$ are doubly stochastic, i.e. $W_n \mathbf{1} = W_n^T \mathbf{1} = \mathbf{1}$. This contribution assumes in addition that the
gradients are bounded and considers rather stringent assumptions on the conditional variances
of the observation noises.

The contributions of this paper are summarized as follows: 

• The distributed stochastic approximation algorithm introduced above is studied under very 
general assumptions. In particular, the algorithm is not required to be of gradient type. 
Stability and convergence are established with the help of a Lyapunov function. It is shown 
that the sequences of estimates at all nodes converge unanimously to an equilibrium set of 
the noiseless recursion seen as a dynamical system. 

• The random gossip matrices $W_n$ are assumed to be row stochastic and column stochastic in
the mean, i.e., $W_n \mathbf{1} = \mathbf{1}$ and $\mathbf{1}^T \mathbb{E}[W_n] = \mathbf{1}^T$. Observe that the row stochasticity constraint
$W_n \mathbf{1} = \mathbf{1}$ is local, since it simply requires that each agent makes a weighted sum of
the estimates of its neighbors with weights summing to one. In contrast, the column
stochasticity constraint $\mathbf{1}^T W_n = \mathbf{1}^T$, which is assumed in many contributions (see e.g.
[13], [7], [12], [14]), requires coordination at the network level (nodes must coordinate
their weights). This constraint is not satisfied by a large class of gossip algorithms. As an
example, the well known broadcast gossip matrices [15] (see also Section II-B below) are
only column stochastic in the mean.

• The unanimous convergence of the estimates is also established in the case where the
frequency of information exchange between the nodes converges to zero at some controlled
rate. In practice, this means that the matrices $W_n$ become more and more likely to be equal
to the identity as $n \to \infty$. The benefits of this possibility in terms of power devoted to
communications are obvious.

• Finally, we establish a Central Limit Theorem (CLT) on the estimates in the case where the
$W_n$ are doubly stochastic. We show in particular that the node estimates tend to fluctuate
synchronously for large $n$, i.e., the disagreement between the nodes is negligible at the CLT
scale. Interestingly, the distributed algorithm under study has the same asymptotic variance
as its centralized analogue.

• We also consider a CLT on the sequences averaged over time as introduced in [16]. We
show that averaging always improves the rate of convergence and the asymptotic variance.

This paper is organized as follows. In Section II, we state and comment on our basic assumptions.
The convergence of the algorithm is studied in Section III. The second order behavior of the algorithm is
described in Section IV. Section VI is devoted to the proofs. An application to distributed
estimation is described in Section V along with some numerical simulations. The appendix
contains some technical details.

II. The model and the basic assumptions 

Let us start by writing the distributed algorithm described in the previous section in a more
compact form. Define the $\mathbb{R}^{dN}$-valued random vectors $\theta_n$ and $Y_n$ by $\theta_n := (\theta_{n,1}^T, \dots, \theta_{n,N}^T)^T$
and $Y_n := (Y_{n,1}^T, \dots, Y_{n,N}^T)^T$ where $A^T$ denotes the transpose of the matrix $A$. The algorithm
reduces to:

$$\theta_n = (W_n \otimes I_d)\,(\theta_{n-1} + \gamma_n Y_n) , \tag{2}$$

where $\otimes$ denotes the Kronecker product and $I_d$ is the $d \times d$ identity matrix.

Note that we always assume $\mathbb{E}|\theta_0|^2 < \infty$ throughout the paper, where $|\cdot|$ represents the
Euclidean norm.

Remark 1: Following [16], we also consider the averaged sequence $(\bar\theta_n)_{n\ge 1}$ given by

$$\bar\theta_{n,i} = \frac{1}{n} \sum_{k=1}^{n} \theta_{k,i} \tag{3}$$

at any instant $n$ for node $i$. We will show in Section IV-B that this averaging technique improves
the convergence rate of the distributed stochastic approximation algorithm. Similarly, we write
$\bar\theta_n := (\bar\theta_{n,1}^T, \dots, \bar\theta_{n,N}^T)^T$. In this paper, we analyze the asymptotic behavior of both sequences $\theta_n$
and $\bar\theta_n$ as $n \to \infty$.
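
The averaged sequence of Remark 1 need not be recomputed from scratch at every instant. A minimal Python sketch for scalar iterates, using the standard recursive form of the arithmetic mean (the function name is hypothetical and chosen for this example):

```python
def running_average(iterates):
    """Return the averaged sequence bar_theta_n = (1/n) * sum_{k<=n} theta_k.

    Uses the recursion bar_theta_n = bar_theta_{n-1} + (theta_n - bar_theta_{n-1}) / n,
    so each node only has to store its current average.
    """
    averaged = []
    mean = 0.0
    for n, theta in enumerate(iterates, start=1):
        mean += (theta - mean) / n
        averaged.append(mean)
    return averaged
```

Each node can apply this update to its own iterates after every gossip step; no additional communication is needed.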

A. Observation and Network Models 

Let $(\mu_\theta)_{\theta \in \mathbb{R}^{dN}}$ be a family of probability measures on $\mathbb{R}^{dN}$ endowed with its Borel $\sigma$-field
$\mathcal{B}(\mathbb{R}^{dN})$ such that for any $A \in \mathcal{B}(\mathbb{R}^{dN})$, $\theta \mapsto \mu_\theta(A)$ is measurable from $(\mathbb{R}^{dN}, \mathcal{B}(\mathbb{R}^{dN}))$ to $([0,1], \mathcal{B}([0,1]))$
where $\mathcal{B}([0,1])$ denotes the Borel $\sigma$-field on $[0,1]$. For any $\theta \in \mathbb{R}^{dN}$, we denote by $\mathbb{E}_\theta$ the
expectation with respect to (w.r.t.) the distribution $\mu_\theta$.

We consider the case when the random variables (r.v.) $(Y_n, W_n)_{n\ge 1}$ are defined on a filtered
probability space $(\Omega, \mathcal{A}, \mathbb{P}, (\mathcal{F}_n)_{n\ge 0})$ and satisfy

Assumption 1: a) $(W_n)_{n\ge 1}$ is a sequence of $N \times N$ random matrices with non-negative
elements such that:

• $W_n$ is row stochastic: $W_n \mathbf{1} = \mathbf{1}$,

• $\mathbb{E}(W_n)$ is column stochastic: $\mathbf{1}^T \mathbb{E}(W_n) = \mathbf{1}^T$,

b) For any positive measurable functions $f$, $g$ and any $n \ge 0$,

$$\mathbb{E}[f(W_{n+1})\, g(Y_{n+1}) \mid \mathcal{F}_n] = \mathbb{E}[f(W_{n+1})]\; \mathbb{E}_{\theta_n}[g(Y)] . \tag{4}$$

c) The sequence $(W_n)_{n\ge 1}$ is identically distributed and the spectral norm $\rho$ of the matrix
$\mathbb{E}(W_1^T (I_N - \mathbf{1}\mathbf{1}^T/N) W_1)$ satisfies $\rho < 1$.

Assumptions 1a) and 1c) capture the properties of the gossiping scheme within the network.
Following the work of [11], random gossip is assumed in this paper. Assumption 1a) has been
commented on in the introduction. The assumption on the spectral norm in Assumption 1c) is a
connectivity condition of the underlying network graph which will be discussed in more detail
in Section II-B. Assumption 1b) implies that (i) the r.v. $W_n$ and $Y_n$ are independent conditionally
on the past, (ii) the r.v. $(W_n)_{n\ge 1}$ are independent and (iii) the conditional distribution of $Y_{n+1}$
given the past is $\mu_{\theta_n}$.

It is also assumed that the step-size sequence $(\gamma_n)_{n\ge 1}$ in the stochastic approximation scheme
satisfies the following conditions, which are rather usual in the framework of stochastic
approximation algorithms [2]:

Assumption 2: The deterministic sequence $(\gamma_n)_{n\ge 1}$ is positive and such that $\lim_n \gamma_n/\gamma_{n+1} = 1$,
$\sum_n \gamma_n = \infty$ and $\sum_n \gamma_n^2 < \infty$.

B. Illustration: Some Examples of Gossip Schemes 

We describe two standard gossip schemes, the so-called pairwise and broadcast schemes. The
reader can refer to [17] for a more complete picture and for more general gossip strategies. The
network of agents is represented as a non-directed graph $(E, V)$ where $E$ is the set of edges and
$V$ is the set of $N$ vertices.

1) Pairwise Gossip: This example can be found in [11] on average consensus (see also [18]).

At time $n$, two connected nodes, say $i$ and $j$, wake up, independently from the past.
Nodes $i$ and $j$ compute the weighted average $\theta_{n,i} = \theta_{n,j} = 0.5\,\tilde\theta_{n,i} + 0.5\,\tilde\theta_{n,j}$; and for $k \notin \{i,j\}$,
the nodes do not gossip: $\theta_{n,k} = \tilde\theta_{n,k}$. In this example, given that the edge $\{i,j\}$ wakes up, $W_n$ is
equal to $I_N - (e_i - e_j)(e_i - e_j)^T/2$ where $e_i$ denotes the $i$th vector of the canonical basis in
$\mathbb{R}^N$; and the matrices $(W_n)_{n\ge 1}$ are i.i.d. and doubly stochastic. Assumption 1a) is obviously
satisfied. Conditions for Assumption 1c) can be found in [11]: the spectral norm $\rho$ of the matrix
$\mathbb{E}(W_n (I_N - \mathbf{1}\mathbf{1}^T/N) W_n^T)$ is in $[0, 1)$ if and only if the weighted graph $(E, V, W)$ is connected,
where the edge $\{i,j\}$ is weighted by the probability that the nodes $i$, $j$ communicate.
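
The pairwise matrix can be written down explicitly. The following Python sketch builds $I_N - (e_i - e_j)(e_i - e_j)^T/2$ for a given pair $i \ne j$ and checks that it is doubly stochastic; the function names and the numerical tolerance are choices made for this illustration only.

```python
def pairwise_gossip_matrix(n_nodes, i, j):
    """Return W = I - (e_i - e_j)(e_i - e_j)^T / 2 as a list of rows (i != j)."""
    weights = [[1.0 if r == c else 0.0 for c in range(n_nodes)]
               for r in range(n_nodes)]
    # (e_i - e_j)(e_i - e_j)^T has +1 at (i,i), (j,j) and -1 at (i,j), (j,i).
    weights[i][i] = weights[j][j] = 0.5
    weights[i][j] = weights[j][i] = 0.5
    return weights


def is_doubly_stochastic(weights, tol=1e-12):
    """Check non-negativity and that every row and every column sums to one."""
    n = len(weights)
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in weights)
    cols_ok = all(abs(sum(weights[r][c] for r in range(n)) - 1.0) <= tol
                  for c in range(n))
    non_negative = all(w >= 0.0 for row in weights for w in row)
    return rows_ok and cols_ok and non_negative
```

Nodes outside the pair keep a unit weight on their own value, which is the "do not gossip" rule stated above.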

2) Broadcast Gossip: This example is adapted from the broadcast scheme in [15]. At time $n$,
a node $i$ wakes up at random with uniform probability and broadcasts its temporary update $\tilde\theta_{n,i}$
to all its neighbors $\mathcal{N}_i$. Any neighbor $j$ computes the weighted average $\theta_{n,j} = \beta\,\tilde\theta_{n,i} + (1-\beta)\,\tilde\theta_{n,j}$.
On the other hand, the nodes $k$ which do not belong to the neighborhood of $i$ (including $i$ itself)
set $\theta_{n,k} = \tilde\theta_{n,k}$. Note that, as opposed to the pairwise scheme, the transmitter node $i$ does not
expect any feedback from its neighbors. Then, given that $i$ wakes up, the $(k,\ell)$th component of $W_n$
is given by:

$$w_n(k,\ell) = \begin{cases} 1 & \text{if } k \notin \mathcal{N}_i \text{ and } k = \ell , \\ \beta & \text{if } k \in \mathcal{N}_i \text{ and } \ell = i , \\ 1-\beta & \text{if } k \in \mathcal{N}_i \text{ and } k = \ell , \\ 0 & \text{otherwise.} \end{cases}$$

This matrix $W_n$ is not doubly stochastic but $\mathbf{1}^T \mathbb{E}(W_n) = \mathbf{1}^T$ (see for instance [15]). Thus, the
matrices $(W_n)_{n\ge 1}$ are i.i.d. and satisfy Assumption 1a). Here again, it can be shown that the
spectral norm $\rho$ of $\mathbb{E}(W_n (I_N - \mathbf{1}\mathbf{1}^T/N) W_n^T)$ is in $[0, 1)$ if and only if $(E, V)$ is a connected
graph (see [15]).
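
The claim that broadcast matrices are row stochastic but only column stochastic in the mean can be checked numerically on a small graph. The sketch below is illustrative: the adjacency structure (a dictionary of neighbor sets for an undirected graph without self-loops), the function names and the value of $\beta$ used in any call are assumptions of the example.

```python
def broadcast_gossip_matrix(neighbors, i, beta):
    """Weights when node i wakes up and broadcasts to its neighbors.

    neighbors: dict mapping each node 0..N-1 to the set of its neighbors
    (undirected graph, no self-loops).
    """
    n_nodes = len(neighbors)
    weights = [[0.0] * n_nodes for _ in range(n_nodes)]
    for k in range(n_nodes):
        if k in neighbors[i]:
            weights[k][i] = beta          # weight put on the broadcast value
            weights[k][k] = 1.0 - beta    # weight kept on the node's own value
        else:
            weights[k][k] = 1.0           # non-neighbors (and i itself) do not move
    return weights


def mean_broadcast_matrix(neighbors, beta):
    """Average of the broadcast matrices over a uniformly chosen waking node."""
    n_nodes = len(neighbors)
    mean = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        w = broadcast_gossip_matrix(neighbors, i, beta)
        for r in range(n_nodes):
            for c in range(n_nodes):
                mean[r][c] += w[r][c] / n_nodes
    return mean
```

For a single realization the column of the waking node sums to $1 + \beta\,|\mathcal{N}_i|$, so the matrix is not column stochastic; averaging over the waking node restores unit column sums because, in an undirected graph, node $\ell$ is a neighbor of exactly $|\mathcal{N}_\ell|$ nodes.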



III. Convergence results 

In this section, we address the asymptotic behavior as $n \to \infty$ of the algorithm (2) and of
its averaged version (3). We prove in Theorem 1 that all agents eventually reach an agreement
on the value of their estimate: the limit points of $(\theta_n)_{n\ge 1}$ (resp. $(\bar\theta_n)_{n\ge 1}$) given by (2) (resp.
(3)) are of the form $\mathbf{1} \otimes \theta_*$.



A. Notations 

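Denote by $|x|$ the Euclidean norm of a vector $x$ and by $\nabla$ the gradient operator (on $\mathbb{R}^d$). Let

$$J := (\mathbf{1}\mathbf{1}^T/N) \otimes I_d , \qquad J_\perp := I_{dN} - J , \tag{5}$$

be resp. the projector onto the consensus subspace $\{\mathbf{1} \otimes \theta : \theta \in \mathbb{R}^d\}$ and the projector onto the
orthogonal subspace. For any vector $x \in \mathbb{R}^{dN}$, define the vector of $\mathbb{R}^d$

$$\langle x \rangle := \frac{1}{N} (\mathbf{1}^T \otimes I_d)\, x , \tag{6}$$

so that $Jx = \mathbf{1} \otimes \langle x \rangle$. Note that $\langle x \rangle = (x_1 + \dots + x_N)/N$ in case we write $x = (x_1^T, \dots, x_N^T)^T$,
$x_i$ in $\mathbb{R}^d$. Set

$$x_\perp := J_\perp x \tag{7}$$

so that $x = \mathbf{1} \otimes \langle x \rangle + x_\perp$. We will refer to $\theta_{\perp,n} := J_\perp \theta_n$ as the disagreement vector.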
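
In coordinates, the decomposition $x = \mathbf{1} \otimes \langle x \rangle + x_\perp$ simply subtracts the network-wide average from each node's block. A small Python sketch, assuming the stacked vector is stored as a flat list of $N$ consecutive blocks of length $d$ (function names are hypothetical):

```python
def block_average(x, n_nodes, dim):
    """<x> = (x_1 + ... + x_N) / N for x = (x_1, ..., x_N) stored as a flat list."""
    return [sum(x[i * dim + c] for i in range(n_nodes)) / n_nodes
            for c in range(dim)]


def disagreement(x, n_nodes, dim):
    """x_perp = x - 1 (x) <x>: each block minus the network-wide average."""
    avg = block_average(x, n_nodes, dim)
    return [x[i * dim + c] - avg[c] for i in range(n_nodes) for c in range(dim)]
```

By construction the block average of the disagreement vector is zero, which is the statement $J J_\perp = 0$.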

B. Assumptions on the distributions $\mu_\theta$

In order to derive the convergence results, assumptions on the probability measures $(\mu_\theta)_{\theta \in \mathbb{R}^{dN}}$
have to be introduced. Define the function $h : \mathbb{R}^d \to \mathbb{R}^d$ by:

$$h(\theta) := \mathbb{E}_{\mathbf{1} \otimes \theta}[\langle Y \rangle] . \tag{8}$$

We shall refer to $h$ as the mean field. The key ingredient to prove the convergence of a stochastic
approximation procedure is the existence of a Lyapunov function $V$ for the mean field $h$ i.e., a
function $V : \mathbb{R}^d \to \mathbb{R}^+$ such that $\nabla V^T h \le 0$.

It is assumed: 

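Assumption 3: There exists a function $V : \mathbb{R}^d \to \mathbb{R}^+$ such that:

a) $V$ is differentiable and $\nabla V$ is a Lipschitz function.

b) For any $\theta \in \mathbb{R}^d$, $\nabla V(\theta)^T h(\theta) \le 0$, where $h$ is given by (8).

c) There exists a constant $C_1$ such that for any $\theta \in \mathbb{R}^d$, $|\nabla V(\theta)|^2 \le C_1 (1 + V(\theta))$.

d) For any $M > 0$, the level set $\{\theta \in \mathbb{R}^d : V(\theta) \le M\}$ is compact.

e) The set $\mathcal{L} := \{\theta \in \mathbb{R}^d : \nabla V(\theta)^T h(\theta) = 0\}$ is non-empty and bounded.

f) $V(\mathcal{L})$ has an empty interior.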

Assumption 3c) implies that the Lyapunov function $V$ increases at most at a quadratic rate when
$|\theta| \to \infty$. Assumption 3f) is trivially satisfied when $\mathcal{L}$ is finite.

When $h$ is a gradient field i.e. $h = -\nabla g$, a natural candidate for the Lyapunov function is
$V = g$. In this case, $\mathcal{L} = \{\nabla g = 0\}$; when $g$ is $d$-times differentiable, Sard's theorem implies
that $g(\{\nabla g = 0\})$ has an empty interior. If $g$ is strictly convex with optimum $\theta_*$, the function
$\theta \mapsto |\theta - \theta_*|^2$ is also a Lyapunov function. In this case, $\mathcal{L} = \{\theta_*\}$.
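
Both claims for the gradient case follow from a one-line computation. The sketch below uses only $h = -\nabla g$, the optimality condition $\nabla g(\theta_*) = 0$ and, for the second line, the monotonicity of the gradient of a differentiable convex function:

```latex
\begin{align*}
V = g: &\quad \nabla V(\theta)^T h(\theta) = -|\nabla g(\theta)|^2 \le 0,
  \quad \text{with equality iff } \nabla g(\theta) = 0; \\
V(\theta) = |\theta - \theta_*|^2: &\quad
  \nabla V(\theta)^T h(\theta)
  = -2\,(\theta - \theta_*)^T \bigl(\nabla g(\theta) - \nabla g(\theta_*)\bigr) \le 0 .
\end{align*}
```

When $g$ is strictly convex the second inequality is strict for $\theta \ne \theta_*$, which is why the zero set reduces to the single point $\theta_*$.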
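Assumption 4: a) There exists a constant $C_2$ such that for any $\theta \in \mathbb{R}^{dN}$,

$$\mathbb{E}_\theta[|Y|^2] \le C_2 \left(1 + V(\langle\theta\rangle) + |\theta_\perp|^2\right) , \tag{9}$$

$$\left| \mathbb{E}_\theta \langle Y \rangle - \mathbb{E}_{\mathbf{1} \otimes \langle\theta\rangle} \langle Y \rangle \right| \le C_2\, |\theta_\perp| . \tag{10}$$

b) The function $h$ is continuous on $\mathbb{R}^d$.

Condition (9) implies that $|h(\theta)|^2 \le C_2 (1 + V(\theta))$ (apply (9) at the point $\mathbf{1} \otimes \theta$ and use Jensen's inequality).
Combined with Assumption 3, this means that $h(\theta)$ is at most linearly increasing when $|\theta| \to \infty$.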

C. Almost sure convergence of the distributed algorithm 

Define $d(\theta, A) := \inf\{|\theta - \varphi| : \varphi \in A\}$ for any $\theta \in \mathbb{R}^d$ and $A \subset \mathbb{R}^d$.
Theorem 1: Under Assumptions 1, 2, 3 and 4, w.p.1,

$$\lim_{n\to\infty} d(\langle\theta_n\rangle, \mathcal{L}) = 0 , \qquad \lim_{n\to\infty} \theta_{\perp,n} = 0 , \tag{11}$$

where $\mathcal{L}$ is given by Assumption 3e). Moreover, w.p.1, $(\langle\theta_n\rangle)_{n\ge 1}$ converges to a connected
component of $\mathcal{L}$.

Theorem 1 states that, almost surely, the vector of iterates $\theta_n$ given by (2) converges to the
consensus space as $n \to \infty$ so that the network asymptotically achieves consensus.

The assumptions of Theorem 1 imply that w.p.1, the sequence $\{V(\langle\theta_n\rangle)\}_{n\ge 0}$ converges to a
(random) point $v_* \in V(\mathcal{L})$. This can be used to show that $(\langle\theta_n\rangle)_{n\ge 0}$ converges to a connected
component of $\{\theta \in \mathcal{L} : V(\theta) = v_*\}$. In general, this does not imply that $(\langle\theta_n\rangle)_{n\ge 0}$ converges
w.p.1 to some (random) point $\theta_* \in \mathcal{L}$. Note nevertheless that this holds true w.p.1 when $\mathcal{L}$ is
finite.

Along any sequence $(\theta_n)_{n\ge 0}$ converging to $\mathbf{1} \otimes \theta_*$ for some $\theta_* \in \mathcal{L}$, Cesàro's lemma
implies that the averaged sequence $(\bar\theta_n)_{n\ge 0}$ converges w.p.1 to $\mathbf{1} \otimes \theta_*$. Therefore, the averaged
sequence (3) and the original sequence (2) have the same limiting value, if any.

D. Case of a vanishing communication rate 

Theorem 1 still holds true when the r.v. $(W_n)_{n\ge 1}$ are not identically distributed. An interesting
example is when $W_n$ is the identity matrix with a probability that tends to one as $n \to \infty$. From
a communication point of view, this means that the exchange of information between agents
becomes rare as $n \to \infty$. This context is especially interesting in the case of wireless networks,
where it is often required to limit as much as possible the amount of communication between
the nodes.

In such cases, Assumption 1c) no longer holds true. We prove a convergence result for the
algorithms (2) and (3) when the spectral norm $\rho_n$ defined below and the step size sequence
$(\gamma_n)_{n\ge 1}$ satisfy the following assumption:

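Assumption 5: $\sum_n \gamma_n = +\infty$ and there exists $\alpha > 1/2$ such that:

$$\lim_{n\to\infty} n^{\alpha} \gamma_n = 0 , \qquad \lim_{n\to\infty} n^{1+\alpha} \gamma_n = +\infty , \tag{12}$$

$$\liminf_{n\to\infty} \frac{1 - \rho_n}{n^{\alpha} \gamma_n} > 0 , \tag{13}$$

where $\rho_n$ is the spectral norm of the matrix $\mathbb{E}(W_n^T (I_N - \mathbf{1}\mathbf{1}^T/N) W_n)$.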

Note that under Assumption 5, $\lim_n n(1 - \rho_n) = +\infty$. A typical framework where this assumption
is useful is the following. Let $(B_n)_n$ be a Bernoulli sequence of independent r.v. with $\mathbb{P}(B_n = 1) = p_n$
and $\liminf_n p_n/(n^{\alpha} \gamma_n) = +\infty$: replace the matrices $W_n$ described by Assumption 1
with $B_n W_n + (1 - B_n) I_N$. Here $p_n$ represents the probability that a communication between the
nodes takes place at time $n$.

We also have $\sum_n \gamma_n^2 < \infty$ so that the step-size sequence $(\gamma_n)_{n\ge 1}$ satisfies the standard
conditions for a stochastic approximation scheme to converge.

An example of sequences $(\gamma_n)_{n\ge 1}$, $(\rho_n)_{n\ge 1}$ satisfying Assumption 5 is given by $1 - \rho_n = a/n^{\eta}$
and $\gamma_n = \gamma_0/n^{\xi}$ with $\eta$, $\xi$ such that $0 \le \eta < \xi - 1/2 \le 1/2$. In particular, $\xi \in (1/2, 1]$ and
$\eta \in [0, 1/2)$.
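
The range of exponents in this example can be checked directly against the conditions of Assumption 5. The following is a sketch of that verification for $\gamma_n = \gamma_0/n^{\xi}$ and $1 - \rho_n = a/n^{\eta}$, assuming $a > 0$ and $\gamma_0 > 0$ (signs that the example does not state explicitly):

```latex
\begin{align*}
n^{\alpha}\gamma_n &= \gamma_0\, n^{\alpha-\xi} \to 0
   &&\Longleftrightarrow\ \alpha < \xi, \\
n^{1+\alpha}\gamma_n &= \gamma_0\, n^{1+\alpha-\xi} \to +\infty
   &&\Longleftrightarrow\ \alpha > \xi - 1, \\
\frac{1-\rho_n}{n^{\alpha}\gamma_n} &= \frac{a}{\gamma_0}\, n^{\xi-\alpha-\eta},
   \quad \liminf_n > 0
   &&\Longleftrightarrow\ \alpha \le \xi - \eta .
\end{align*}
```

Any $\alpha$ with $1/2 < \alpha < \xi$ and $\alpha \le \xi - \eta$ works, and for $\eta \ge 0$ such an $\alpha$ exists exactly when $\eta < \xi - 1/2$. The second line then holds automatically because $\xi \le 1$, which is itself required by $\sum_n \gamma_n = +\infty$.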



When the r.v. $(W_n)_{n\ge 1}$ are i.i.d., the spectral norm $\rho_n$ is equal to $\rho$ for any $n$, and (13) implies
$\rho < 1$: one is back to Assumption 1c). From this point of view, Assumption 5 is weaker than
Assumption 1c). Nevertheless, stronger constraints than Assumption 2 are needed on the step
size $(\gamma_n)_{n\ge 1}$.

When substituting Assumption 5 for Assumptions 1c) and 2, we have

Theorem 2: The statement of Theorem 1 remains valid under Assumptions 1a-b), 3, 4 and 5.

Theorem 2 is proved in Section VI-C.



IV. Convergence rates 
In this section, we derive the convergence rate in $L^2$ of the disagreement sequence $(\theta_{\perp,n})_n$
defined by $\theta_{\perp,n} := J_\perp \theta_n$ (see (5) and (7)). We also derive Central Limit Theorems for the
sequences $(\theta_n)_n$ and $(\bar\theta_n)_n$: we show that averaging always improves the convergence rate and
the asymptotic variance.

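A. Convergence rate of the disagreement vector $\theta_{\perp,n}$

Whereas Theorem 1 states that $\theta_{\perp,n} \to 0$ almost surely, Theorem 3 provides information
on the convergence rate: $\theta_{\perp,n}$ tends to zero in $L^2$ at rate $1/\gamma_n$, in the sense that
$\gamma_n^{-2}\, \mathbb{E}(|\theta_{\perp,n}|^2)$ remains bounded.
Theorem 3: Under Assumptions 1, 2, 3 and 4,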

7.-^E (|0^,„n < ^^4^ + O [p-'^rn') (14) 



where $\rho$ is given by Assumption 1c) and $C := \limsup_{n\to\infty} \mathbb{E}(|Y_{\perp,n}|^2)$ is finite.



B. Central Limit Theorems 

We derive Central Limit Theorems for sequences $(\theta_n)_n$ and $(\bar\theta_n)_n$ converging to a point $\mathbf{1} \otimes \theta_*$
for some $\theta_* \in \mathcal{L}$. To that goal, we restrict our attention to the case when the matrices $(W_n)_n$ are
doubly stochastic i.e. $\mathbf{1}^T W_n = \mathbf{1}^T$. The general case is far more technical and out of the scope
of this paper. We also assume that the point $\theta_*$ and the r.v. $Y$ satisfy

Assumption 6: a) $\theta_* \in \mathcal{L}$.

b) The mean field $h : \mathbb{R}^d \to \mathbb{R}^d$ given by (8) is twice continuously differentiable in a neighborhood
of $\theta_*$.

c) $\nabla h(\theta_*)$ is a Hurwitz matrix i.e. the largest real part of its eigenvalues is $-L$ for some $L > 0$.

Assumption 7: a) There exist $\delta > 0$ and $\tau > 0$ such that $\sup_{|\theta - \mathbf{1} \otimes \theta_*| \le \delta} \mathbb{E}_\theta\left[|\langle Y \rangle|^{2+\tau}\right] < \infty$.

b) The function $\theta \mapsto \mathbb{E}_\theta\left[\langle Y \rangle \langle Y \rangle^T\right]$ is continuous in a neighborhood of $\mathbf{1} \otimes \theta_*$.

We finally strengthen the assumptions on the step-size sequence $(\gamma_n)_{n\ge 0}$ and assume that

Assumption 8: a) $(\gamma_n)_n$ is a positive deterministic sequence such that either $\log(\gamma_k/\gamma_{k+1}) = o(\gamma_k)$,
or $\log(\gamma_k/\gamma_{k+1}) \sim \gamma_k/\gamma_*$ for some $\gamma_* > 1/(2L)$.

b) $\sum_n \gamma_n = \infty$ and $\sum_n \gamma_n^2 < \infty$.

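c) $\lim_n n \gamma_n = +\infty$ and

$$\lim_{n\to\infty} \frac{1}{\sqrt{n}} \sum_{k=1}^{n} \gamma_k^{-1/2} \left| 1 - \frac{\gamma_k}{\gamma_{k+1}} \right| = 0 .$$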



The step size $\gamma_n \sim \gamma_*/n^{\xi}$ satisfies Assumptions 8a-b) for any $1/2 < \xi < 1$ since $\log(\gamma_k/\gamma_{k+1}) \sim \xi/k$.
Similarly, if $\gamma_n \sim \gamma_*/n$, Assumption 8a) holds provided that $\gamma_* > 1/(2L)$. Observe that
when the sequence $(\gamma_n)_n$ is ultimately non-increasing, then the condition $\lim_n n\gamma_n = +\infty$ implies
$\lim_n \frac{1}{\sqrt{n}} \sum_{k=1}^{n} \gamma_k^{-1/2} \left|1 - \gamma_k/\gamma_{k+1}\right| = 0$ (see e.g. [19, Theorem 26, Chapter 4]).
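
To see where the two regimes of Assumption 8a) come from, a short computation for the exact power-law case $\gamma_k = \gamma_*/k^{\xi}$ (an assumption made here for simplicity; the text only requires asymptotic equivalence) gives:

```latex
\begin{align*}
\log\frac{\gamma_k}{\gamma_{k+1}} &= \xi \log\Bigl(1 + \frac{1}{k}\Bigr)
   = \frac{\xi}{k} + O\bigl(k^{-2}\bigr), \\
\xi < 1: &\quad \frac{\xi/k}{\gamma_k} = \frac{\xi}{\gamma_*}\, k^{\xi - 1} \to 0
   \ \Longrightarrow\ \log\frac{\gamma_k}{\gamma_{k+1}} = o(\gamma_k), \\
\xi = 1: &\quad \frac{1}{k} = \frac{\gamma_k}{\gamma_*}
   \ \Longrightarrow\ \log\frac{\gamma_k}{\gamma_{k+1}} \sim \frac{\gamma_k}{\gamma_*} .
\end{align*}
```

The case $\xi = 1$ therefore falls in the second regime, which is the only one where the constraint $\gamma_* > 1/(2L)$ enters.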
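Set

$$\Upsilon := \mathbb{E}_{\mathbf{1} \otimes \theta_*}\left[\langle Y \rangle \langle Y \rangle^T\right] - \mathbb{E}_{\mathbf{1} \otimes \theta_*}\left[\langle Y \rangle\right] \mathbb{E}_{\mathbf{1} \otimes \theta_*}\left[\langle Y \rangle\right]^T .$$

Theorem 4: Let Assumptions 1, 3, 4, 6, 7, 8a-b) hold true. Assume in addition that $\mathbf{1}^T W_n = \mathbf{1}^T$
w.p.1. Then under the conditional probability $\mathbb{P}(\cdot \mid \lim_k \theta_k = \mathbf{1} \otimes \theta_*)$, the sequence of r.v.
$(\gamma_n^{-1/2} (\theta_n - \mathbf{1} \otimes \theta_*))_{n\ge 0}$ converges in distribution to $\mathbf{1} \otimes Z$ where $Z$ is a centered Gaussian
distribution with covariance matrix $\Sigma$ solution of the Lyapunov equation:

$$\nabla h(\theta_*)\, \Sigma + \Sigma\, \nabla h(\theta_*)^T = -\Upsilon \qquad \text{if } \log(\gamma_k/\gamma_{k+1}) = o(\gamma_k) ,$$

$$(I + 2\gamma_* \nabla h(\theta_*))\, \Sigma + \Sigma\, (I + 2\gamma_* \nabla h(\theta_*)^T) = -2\gamma_* \Upsilon \qquad \text{if } \log(\gamma_k/\gamma_{k+1}) \sim \gamma_k/\gamma_* .$$

The proof of Theorem 4 is postponed to Section VI-E.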



The asymptotic variance can be compared to the asymptotic variance in a centralized algorithm:
formally, such an algorithm is obtained by setting $W_n = \mathbf{1}\mathbf{1}^T/N$, so that $W_n \otimes I_d = J$. Interestingly, the distributed
algorithm under study has the same asymptotic variance as its centralized analogue.

Theorem 4 shows that when $\gamma_n \sim \gamma_*/n^{\alpha}$ for some $\alpha \in (1/2, 1]$, then the rate in the CLT is
$O(1/n^{\alpha/2})$. Therefore, the maximal rate of convergence is achieved with $\gamma_n \sim \gamma_*/n$ and in this
case, the rate is $O(1/\sqrt{n})$. Unfortunately, the use of such a rate requires choosing $\gamma_*$ as a
function of $\nabla h(\theta_*)$ (through the bound $L$, see Assumption 8a)), and in practice $\nabla h(\theta_*)$
is unknown. We will show in Theorem 5 that the optimal rate $O(1/\sqrt{n})$ can be reached by
applying the averaged procedure (3) with $\gamma_n \sim \gamma_*/n^{\alpha}$ whatever $\alpha \in (1/2, 1)$.

A second question is the scaling of the observations in the local step. Observe that during
each local step of the algorithm (see (1)), each agent can use a common invertible matrix gain
$\Gamma$ and update the temporary iterate $\tilde\theta_{n,i}$ as

$$\tilde\theta_{n,i} = \theta_{n-1,i} + \gamma_n\, \Gamma\, Y_{n,i} . \tag{15}$$

It is readily seen that the new mean field $\tilde h : \theta \mapsto \mathbb{E}_{\mathbf{1} \otimes \theta}[\langle (I_N \otimes \Gamma) Y \rangle]$ is equal to $\Gamma h$ and
Assumptions 3 and 4 remain valid with $(Y, h, V)$ replaced by $((I_N \otimes \Gamma) Y, \Gamma h, \Gamma^{-1} V)$. Therefore,
introducing a gain matrix $\Gamma$ does not change the limiting points of the algorithm (2) (and thus
(3)) but changes the asymptotic variance. In the case of the optimal rate in Theorem 4 (i.e. the
case $\gamma_n \sim \gamma_*/n$ for some $\gamma_* > 1/(2L)$), it can be proved following the same lines as in [20]
(see also [1, Proposition 4, Chapter 3, Part I]), that the optimal choice of the gain matrix is
$\Gamma_* = -\gamma_*^{-1} \nabla h(\theta_*)^{-1}$. By optimal, we mean that, when weighting the observations by $\Gamma_*$ as
in (15), the asymptotic covariance matrix $\Sigma_*$ obtained through Theorem 4 is smaller than the
limiting covariance $\Sigma_\Gamma$ associated with any other gain matrix $\Gamma$ i.e., $\Sigma_\Gamma - \Sigma_*$ is nonnegative.
Moreover, $\Sigma_*$ is equal to:

$$\gamma_*^{-1}\, \nabla h(\theta_*)^{-1}\, \Upsilon\, \nabla h(\theta_*)^{-T} .$$

Otherwise stated, $(\sqrt{n}\, (\langle\theta_n\rangle - \theta_*))_{n\ge 0}$ converges to a centered Gaussian vector with covariance
matrix $\nabla h(\theta_*)^{-1}\, \Upsilon\, \nabla h(\theta_*)^{-T}$.

In practice, $\nabla h(\theta_*)$ is unknown and such a choice of gain matrix cannot be plugged into
the algorithm (2). Fortunately, Theorem 5 shows that this optimal variance can be reached by
averaging the sequence $(\theta_n)_n$.

Note that these two major features of averaging algorithms for stochastic approximation
(optimal convergence rate and optimal limiting covariance matrix) have been pointed out by
[16] (see also [21]) in the case of centralized algorithms.

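Theorem 5: Let $(\gamma_n)_n$ be a deterministic positive sequence such that $\log(\gamma_k/\gamma_{k+1}) = o(\gamma_k)$.
Let Assumptions 1, 3, 4, 6, 7, 8b-c) hold true. Assume in addition that $\mathbf{1}^T W_n = \mathbf{1}^T$ w.p.1. Then
under the conditional probability $\mathbb{P}(\cdot \mid \lim_k \theta_k = \mathbf{1} \otimes \theta_*)$, the sequence of r.v. $(\sqrt{n}\, (\bar\theta_n - \mathbf{1} \otimes \theta_*))_{n\ge 0}$
converges in distribution to $\mathbf{1} \otimes Z$ where $Z$ is a centered Gaussian distribution with covariance matrix

$$\nabla h(\theta_*)^{-1}\, \Upsilon\, \nabla h(\theta_*)^{-T} .$$

The proof of Theorem 5 is postponed to Section VI-F.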



V. An Application Framework 

A. Distributed estimation 

To illustrate the results, we describe in this section a distributed parameter estimation algorithm
which converges to a limit point of the centralized Maximum Likelihood (ML) estimator. Assume
that node $i$ receives at time $n$ the $\mathbb{R}^{m_i}$-valued component $X_{n,i}$ of the i.i.d. random process
$X_n = (X_{n,1}^T, \dots, X_{n,N}^T)^T \in \mathbb{R}^{\sum_i m_i}$, where $X_1$ has the unknown density $f_*(x)$ with respect to the
Lebesgue measure. The system designer considers that the density of $X_1$ belongs to a family
$\{f(\theta, x)\}_{\theta \in \mathbb{R}^d}$. When $f(\theta, x)$ satisfies some regularity and smoothness conditions, the limit points
of the sequences $\hat\theta_n$ that maximize the log-likelihood function $L_n(\theta) = \sum_{k=1}^{n} \log f(\theta, X_k)$
are minimizers of the Kullback-Leibler divergence $D(f_* \,\|\, f(\theta, \cdot))$ [22]. Our aim is to design
a distributed and iterative algorithm that exhibits the same asymptotic behavior in the case
where $f(\theta, x)$ is of the form $f(\theta, x) = \prod_{i=1}^{N} f_i(\theta, x_i)$ where $x = (x_1^T, \dots, x_N^T)^T$ is partitioned
similarly to $X_1$. To that purpose, Algorithm (2) is implemented with the increments
$Y_{n+1,i} = \nabla_\theta \log f_i(\theta_{n,i}, X_{n+1,i})$ where $\nabla_\theta$ is the gradient with respect to $\theta$. In some sense,
$\log f_i(\theta_{n,i}, X_{n+1,i})$ is a local log-likelihood function that is updated by node $i$ at time $n+1$ by
a gradient approach. Writing $\theta = (\theta_1^T, \dots, \theta_N^T)^T$, the distribution $\mu_\theta$ introduced in Section II-A
is defined by the identity

$$\mathbb{E}_\theta[g(Y)] = \int g\left( (\nabla_\theta \log f_1(\theta_1, x_1)^T, \dots, \nabla_\theta \log f_N(\theta_N, x_N)^T)^T \right) f_*(x)\, dx$$

for every measurable function $g : \mathbb{R}^{dN} \to \mathbb{R}^+$. The associated mean field given by Equation (8)
will be

$$h(\theta) = \frac{1}{N} \int \nabla_\theta \log f(\theta, x)\, f_*(x)\, dx .$$

Since $h(\theta) = -N^{-1} \nabla_\theta D(f_* \,\|\, f(\theta, \cdot))$ (assuming $\nabla_\theta$ and $\int$ can be interchanged), our algorithm
is of a gradient type with $V(\theta) = D(f_* \,\|\, f(\theta, \cdot))$ as the natural Lyapunov function. Under
the assumptions of Theorem 1 or Theorem 2, we know that the $\theta_{n,i}$, $i = 1, \dots, N$, converge
unanimously to $\mathcal{L} = \{\theta : \nabla V(\theta) = 0\}$. Here, we note that under some weak extra assumptions
on the "noise" of the algorithm, it is possible to show that unstable points such as local maxima
or saddle points of $V(\theta)$ are avoided (see for instance [23], [24], [25]). Consequently, the first
order behavior of the distributed algorithm is identical to that of the centralized ML algorithm.

We now consider the second order behavior of these algorithms, restricting ourselves to the case
where $f_*(x) = \prod_{i=1}^{N} f_i(\theta_*, x_i)$ for some $\theta_* \in \mathbb{R}^d$. With some conditions on $f_*$, it is well known
that any consistent sequence $\hat\theta_n$ of estimates provided by the centralized ML algorithm satisfies
$\sqrt{n}\,(\hat\theta_n - \theta_*) \xrightarrow{\mathcal{D}} \mathcal{N}(0, F(\theta_*)^{-1})$ where $\xrightarrow{\mathcal{D}}$ stands for the convergence in distribution, $\mathcal{N}(0, \Sigma)$
represents the centered Gaussian distribution with covariance $\Sigma$ and

$$F(\theta_*) = \sum_{i=1}^{N} \int \nabla_\theta \log f_i(\theta_*, x_i)\, \nabla_\theta \log f_i(\theta_*, x_i)^T f_i(\theta_*, x_i)\, dx_i$$

is the Fisher information matrix of $f(\theta_*, \cdot)$ [22, Chap. 6]. We now turn to the distributed algorithm
and to that end, we apply Theorems 4 and 5. Matrices $\nabla h(\theta_*)$ and $\Upsilon$ found in the statements
of these theorems coincide in our case with $-N^{-1} F(\theta_*)$ and $N^{-2} F(\theta_*)$ respectively (same
value of $\Upsilon$ for both theorems). Starting with the averaged case, Theorem 5 shows that on the
set $\{\lim_n \theta_n = \mathbf{1} \otimes \theta_*\}$, the averaged sequence $\bar\theta_n$ satisfies $\sqrt{n}\,(\bar\theta_n - \mathbf{1} \otimes \theta_*) \xrightarrow{\mathcal{D}} \mathbf{1} \otimes Z$
where $Z \sim \mathcal{N}(0, F(\theta_*)^{-1})$. This implies that the averaged algorithm is asymptotically efficient,
similarly to the centralized ML algorithm. Let us consider the non-averaged algorithm. In order
to make a fair comparison with the centralized ML algorithm, we restrict the use of Theorem
4 to the case where $\gamma_n$ has the form $\gamma_n = \gamma_*/n$. In that case, Assumption 8 is verified when
$\gamma_* > N/(2\lambda_{\min}(F(\theta_*)))$ where $\lambda_{\min}(F(\theta_*))$ is the smallest eigenvalue of $F(\theta_*)$. Theorem 4
shows that on the set $\{\lim_n \theta_n = \mathbf{1} \otimes \theta_*\}$, the sequence of estimates $\theta_n$ satisfies
$\sqrt{n}\,(\theta_n - \mathbf{1} \otimes \theta_*) \xrightarrow{\mathcal{D}} \mathbf{1} \otimes Z$ where $Z \sim \mathcal{N}(0, \Sigma)$, and where $\Sigma$ is the solution of the matrix equation
$\Sigma\,(2N^{-1}\gamma_* F(\theta_*) - I_d) + (2N^{-1}\gamma_* F(\theta_*) - I_d)\,\Sigma = 2\gamma_*^2 N^{-2} F(\theta_*)$. Solving this equation, we obtain
$\Sigma = \gamma_*^2 N^{-2} F(\theta_*)\,(2\gamma_* N^{-1} F(\theta_*) - I_d)^{-1}$. Notice that
$\Sigma - F(\theta_*)^{-1} = F(\theta_*)^{-1} (2\gamma_* N^{-1} F(\theta_*) - I_d)^{-1} (\gamma_* N^{-1} F(\theta_*) - I_d)^2 \ge 0$, which quantifies the departure from asymptotic efficiency of the
non-averaged algorithm.
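
The size of this efficiency gap is easy to evaluate in the scalar case $d = 1$. The Python sketch below implements the closed form $\Sigma = \gamma_*^2 N^{-2} F\,(2\gamma_* N^{-1} F - 1)^{-1}$ for a scalar Fisher information $F$; the function names and the numerical values used in any call are assumptions of the example, not quantities from the paper.

```python
def sqrt_n_variance(fisher, n_nodes, gamma_star):
    """Scalar case of Sigma = gamma*^2 N^-2 F (2 gamma* N^-1 F - 1)^-1."""
    a = gamma_star * fisher / n_nodes
    if a <= 0.5:
        # Corresponds to the condition gamma_star > N / (2 * lambda_min(F)).
        raise ValueError("requires gamma_star > N / (2 F)")
    return (gamma_star ** 2) * fisher / (n_nodes ** 2) / (2.0 * a - 1.0)


def efficiency_gap(fisher, n_nodes, gamma_star):
    """Sigma - 1/F: excess asymptotic variance over the Cramer-Rao value 1/F."""
    return sqrt_n_variance(fisher, n_nodes, gamma_star) - 1.0 / fisher
```

In this scalar setting the gap equals $(a - 1)^2 / (F (2a - 1))$ with $a = \gamma_* F / N$, so it vanishes only for the single choice $\gamma_* = N/F$, which requires knowing $F$.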

B. Application to source localization 

The distributed algorithm described above is used here to localize a source by a collection of
$N = 40$ sensors. The unknown location of the source in the plane is represented by a parameter
$\theta_* \in \mathbb{R}^2$. The sensors are located in the square $[0, 50] \times [0, 50]$ as shown by Figure 1 and
they receive scalar-valued signals from the source ($m_i = 1$ for all $i$). It is assumed that the
density of $X_1 \in \mathbb{R}^N$ is $f_*(x) = \prod_{i=1}^{N} f_i(\theta_*, x_i)$ where $f_i(\theta_*, \cdot) = \mathcal{N}(1000/|\theta_* - r_i|^2, 10^{-2})$
where $r_i \in \mathbb{R}^2$ is the location of Node $i$. The fitted model is $f(\theta, x) = \prod_{i=1}^{N} f_i(\theta, x_i)$ with
$f_i(\theta, \cdot) = \mathcal{N}(1000/|\theta - r_i|^2, 10^{-2})$ (see [25] for a similar model). The model for matrices $W_n$ is



the pairwise gossip model described in Section II-B The step sequence 7^ is set to ci/rr for 



n < 10000 iterations, C2{\ogn/nf-^ for 10000 <n< 20000 and c^{\ogn/nf-^ for n > 20000 
with $c_1 < c_2 < c_3$. Finally, the initial value $\theta_0 \in \mathbb{R}^{dN}$ is chosen at random under the uniform
distribution on the square $[0, 50] \times [0, 50]$.

The convergence of the distributed algorithm to the consensus subspace is illustrated in
Figure 2. Four paths (starting from the same value $\theta_0$) are run and we display $n \mapsto (1/N)\,|\theta_n - \mathbf{1} \otimes \theta_*|$
for $n \le 50000$. Note the role of the step size sequence in the rate of convergence (compare
the definition of $\gamma_n$ above and the changes in the slopes at time $n = 10000$ and $n = 20000$).

VI. Proofs

A. Notations 

For a positive deterministic sequence $(a_n)_{n\ge 1}$, $o(a_n)$ stands for a deterministic vector-valued
sequence $(x_n)_{n\ge 1}$ such that $\lim_{n\to\infty} a_n^{-1} |x_n| = 0$. For $p > 0$, we denote the $L^p$-norm of a
random vector $X$ by $\|X\|_p := \mathbb{E}(|X|^p)^{1/p}$. $o_{L^p}(a_n)$ stands for any sequence of vector-valued r.v. $(X_n)_{n\ge 1}$
such that $\lim_{n\to\infty} a_n^{-1} \|X_n\|_p = 0$; $O_{L^p}(a_n)$ stands for any sequence of vector-valued r.v. $(X_n)_{n\ge 1}$ such that
$\limsup_n a_n^{-1} \|X_n\|_p < \infty$; and $O_{w.p.1}(a_n)$ stands for any sequence of vector-valued r.v. $(X_n)_{n\ge 1}$ such that
$\limsup_n a_n^{-1} |X_n|$ is finite almost surely.

We start with a preliminary lemma which will be crucial for most of the proofs. 

B. Preliminary result 

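Lemma 1 (Agreement): Under Assumptions 1a-b), 2, 3a-c), 4a) and 5,

a) $\sum_{n\ge 1} \mathbb{E}\,|\theta_{\perp,n}|^2 < \infty$ and $(\theta_{\perp,n})_{n\ge 1}$ converges to zero w.p.1.

b) $\sup_{n\ge 1} \mathbb{E}\, V(\langle\theta_n\rangle) < \infty$,

where $\langle x \rangle$ and $x_\perp$ are given by (6) and (7).

Proof: Define $u_n := \mathbb{E}\left[|\theta_{\perp,n}|^2\right]$ and $v_n := \mathbb{E}\left[V(\langle\theta_n\rangle)\right]$. We prove that there exist a constant
$M > 0$ and an integer $n_0$ such that for any $n \ge n_0$:

$$u_n \le \rho_n u_{n-1} + \gamma_n M \sqrt{u_{n-1}}\, (1 + u_{n-1} + v_{n-1})^{1/2} + \gamma_n^2 M\, (1 + u_{n-1} + v_{n-1}) , \tag{16}$$

$$v_n \le v_{n-1} + M u_{n-1} + \gamma_n M \sqrt{u_{n-1}}\, (1 + u_{n-1} + v_{n-1})^{1/2} + \gamma_n^2 M\, (1 + u_{n-1} + v_{n-1}) . \tag{17}$$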

The proof is then concluded by application of Lemma |3] upon noting that under assumption |2} 
the rate 0„ = n^" satisfies the conditions (29) and ( |30l ). 



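Proof of (16). As $W_n \mathbf{1} = \mathbf{1}$, we have $J_\perp(W_n \otimes I_d) = J_\perp(W_n \otimes I_d)J_\perp$. As a consequence,
$\theta_{\perp,n} = J_\perp(W_n \otimes I_d)(\theta_{\perp,n-1} + \gamma_n Y_n)$. We expand the squared Euclidean norm of the latter vector:

$$|\theta_{\perp,n}|^2 = (\theta_{\perp,n-1} + \gamma_n Y_n)^T\big(\{W_n^T(I_N - \mathbf{1}\mathbf{1}^T/N)W_n\} \otimes I_d\big)(\theta_{\perp,n-1} + \gamma_n Y_n).$$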

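Integrate both sides of the above equation w.r.t. the r.v. $W_n$, using Assumption 1.
Under Assumption 5, $\lim_n n(1 - \rho_n) = +\infty$; then, there exists $n_0$ such that $\rho_n < 1$ for any
$n \ge n_0$. We obtain:

$$u_n \le \rho_n u_{n-1} + 2\gamma_n\,\mathbb{E}[|\theta_{\perp,n-1}|\,|Y_n|] + \gamma_n^2\,\mathbb{E}[|Y_n|^2]$$

for any $n \ge n_0$. From the Cauchy-Schwarz inequality, $\mathbb{E}[|\theta_{\perp,n-1}|\,|Y_n|] \le \sqrt{u_{n-1}}\,(\mathbb{E}[|Y_n|^2])^{1/2}$.
Thus,

$$u_n \le \rho_n u_{n-1} + 2\gamma_n\sqrt{u_{n-1}}\,(\mathbb{E}[|Y_n|^2])^{1/2} + \gamma_n^2\,\mathbb{E}[|Y_n|^2].$$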

By assumption, we have the following estimate: $\mathbb{E}[|Y_n|^2] \le C_2\,(1 + v_{n-1} + u_{n-1})$. This
completes the proof of (16), for any constant $M$ larger than $1 + C_2$.
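The identity $J_\perp(W_n \otimes I_d) = J_\perp(W_n \otimes I_d)J_\perp$ used at the start of the proof of (16) only relies on $W_n\mathbf{1} = \mathbf{1}$. The following sketch checks it numerically in the scalar case $d = 1$, for a randomly drawn row-stochastic matrix; the helper names and the matrix size are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Plain-Python product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def random_row_stochastic(n, rng):
    """Random nonnegative matrix whose rows sum to one (W 1 = 1)."""
    rows = []
    for _ in range(n):
        raw = [rng.random() for _ in range(n)]
        total = sum(raw)
        rows.append([x / total for x in raw])
    return rows


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


n = 4
rng = random.Random(1)
w = random_row_stochastic(n, rng)
# J_perp = I - 11^T/N, the projector onto the orthogonal of the consensus line (d = 1)
j_perp = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]

lhs = matmul(j_perp, w)                  # J_perp W
rhs = matmul(matmul(j_perp, w), j_perp)  # J_perp W J_perp
gap = max_abs_diff(lhs, rhs)
print("max |J_perp W - J_perp W J_perp| =", gap)
```

The gap is zero up to floating-point rounding, because $W\mathbf{1} = \mathbf{1}$ gives $W\,\mathbf{1}\mathbf{1}^T/N = \mathbf{1}\mathbf{1}^T/N$ and $J_\perp\,\mathbf{1}\mathbf{1}^T/N = 0$.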



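Proof of (17). We use the following Taylor-Lagrange expansion of the Lyapunov function $V$
between $\langle\theta_{n-1}\rangle$ and $\langle\theta_n\rangle$. There exists $\tilde\theta_n \in \mathbb{R}^d$ such that $|\tilde\theta_n - \langle\theta_{n-1}\rangle| \le |\langle\theta_n\rangle - \langle\theta_{n-1}\rangle|$ and

$$V(\langle\theta_n\rangle) = V(\langle\theta_{n-1}\rangle) + \nabla V(\tilde\theta_n)^T(\langle\theta_n\rangle - \langle\theta_{n-1}\rangle).$$

Under Assumption 2, $\nabla V$ is a Lipschitz function. Thus, $|\nabla V(\tilde\theta_n) - \nabla V(\langle\theta_{n-1}\rangle)| \le K_{Lip}|\langle\theta_n\rangle - \langle\theta_{n-1}\rangle|$, where $K_{Lip}$ denotes the Lipschitz constant. Therefore,

$$V(\langle\theta_n\rangle) \le V(\langle\theta_{n-1}\rangle) + \nabla V(\langle\theta_{n-1}\rangle)^T(\langle\theta_n\rangle - \langle\theta_{n-1}\rangle) + K_{Lip}|\langle\theta_n\rangle - \langle\theta_{n-1}\rangle|^2. \tag{18}$$

We need to evaluate the difference $\langle\theta_n\rangle - \langle\theta_{n-1}\rangle$. By (2),

$$\langle\theta_n\rangle = \Big(\frac{\mathbf{1}^T W_n}{N} \otimes I_d\Big)(\theta_{n-1} + \gamma_n Y_n).$$

Therefore,

$$\langle\theta_n\rangle - \langle\theta_{n-1}\rangle = \Big(\frac{\mathbf{1}^T W_n - \mathbf{1}^T}{N} \otimes I_d\Big)\theta_{n-1} + \Big(\frac{\mathbf{1}^T W_n}{N} \otimes I_d\Big)\gamma_n Y_n = \Big(\frac{\mathbf{1}^T W_n - \mathbf{1}^T}{N} \otimes I_d\Big)\theta_{\perp,n-1} + \Big(\frac{\mathbf{1}^T W_n}{N} \otimes I_d\Big)\gamma_n Y_n, \tag{19}$$
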

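where the second equality is due to the fact that $W_n$ is row-stochastic. Under Assumption 1,
$\mathbb{E}(W_n)$ is doubly stochastic. Thus, using Assumption 1 again:

$$\mathbb{E}[\langle\theta_n\rangle - \langle\theta_{n-1}\rangle \mid \mathcal{F}_{n-1}] = \gamma_n\,\mathbb{E}_{\theta_{n-1}}[\langle Y\rangle]. \tag{20}$$

Plugging (20) into (18),

$$\mathbb{E}[V(\langle\theta_n\rangle) \mid \mathcal{F}_{n-1}] \le V(\langle\theta_{n-1}\rangle) + \gamma_n \nabla V(\langle\theta_{n-1}\rangle)^T\,\mathbb{E}_{\theta_{n-1}}[\langle Y\rangle] + K_{Lip}\,\mathbb{E}[|\langle\theta_n\rangle - \langle\theta_{n-1}\rangle|^2 \mid \mathcal{F}_{n-1}].$$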
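By Assumption 2, the quantity $-\nabla V(\langle\theta_{n-1}\rangle)^T h(\langle\theta_{n-1}\rangle)$ is positive; therefore,

$$\mathbb{E}[V(\langle\theta_n\rangle) \mid \mathcal{F}_{n-1}] \le V(\langle\theta_{n-1}\rangle) + \gamma_n \nabla V(\langle\theta_{n-1}\rangle)^T\big(\mathbb{E}_{\theta_{n-1}}[\langle Y\rangle] - h(\langle\theta_{n-1}\rangle)\big) + K_{Lip}\,\mathbb{E}[|\langle\theta_n\rangle - \langle\theta_{n-1}\rangle|^2 \mid \mathcal{F}_{n-1}].$$

Using successively the relevant conditions of the assumptions and then the Cauchy-Schwarz inequality, the expectation of $|\nabla V(\langle\theta_{n-1}\rangle)^T(\mathbb{E}_{\theta_{n-1}}[\langle Y\rangle] - h(\langle\theta_{n-1}\rangle))|$ is no larger than
$\sqrt{C_1 C_2}\,\sqrt{u_{n-1}(1 + v_{n-1})}$. We obtain:

$$v_n \le v_{n-1} + \gamma_n\sqrt{C_1 C_2}\,\sqrt{u_{n-1}(1 + u_{n-1} + v_{n-1})} + K_{Lip}\,\mathbb{E}[|\langle\theta_n\rangle - \langle\theta_{n-1}\rangle|^2], \tag{21}$$

where we used the fact that $u_{n-1} \ge 0$. We now need to find an estimate for $\mathbb{E}[|\langle\theta_n\rangle - \langle\theta_{n-1}\rangle|^2]$.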



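Using Minkowski's inequality on (19),

$$\mathbb{E}\big[|\langle\theta_n\rangle - \langle\theta_{n-1}\rangle|^2\big]^{1/2} \le \mathbb{E}\Big[\Big|\Big(\frac{\mathbf{1}^T W_n - \mathbf{1}^T}{N} \otimes I_d\Big)\theta_{\perp,n-1}\Big|^2\Big]^{1/2} + \mathbb{E}\Big[\Big|\Big(\frac{\mathbf{1}^T W_n}{N} \otimes I_d\Big)\gamma_n Y_n\Big|^2\Big]^{1/2}. \tag{22}$$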



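Focus on the first term of the RHS of the above inequality. Remark that

$$\mathbb{E}[(W_n^T\mathbf{1} - \mathbf{1})(\mathbf{1}^T W_n - \mathbf{1}^T) \mid \mathcal{F}_{n-1}] = \mathbb{E}[W_n^T\mathbf{1}\mathbf{1}^T W_n] - \mathbf{1}\mathbf{1}^T,$$

where we used Assumption 1 along with the fact that $\mathbb{E}(W_n)$ is doubly stochastic (see
Assumption 1). Upon noting that the entries of $W_n$ are in $[0,1]$ (as a consequence of
Assumption 1), the spectral norm of $\mathbb{E}[W_n^T\mathbf{1}\mathbf{1}^T W_n] - \mathbf{1}\mathbf{1}^T$ is bounded. Thus, there exists a
constant $C'$ such that:

$$\mathbb{E}\Big[\Big|\Big(\frac{\mathbf{1}^T W_n - \mathbf{1}^T}{N} \otimes I_d\Big)\theta_{\perp,n-1}\Big|^2\Big] \le C' u_{n-1}.$$

By similar arguments, there exists a constant $C''$ such that

$$\mathbb{E}\Big[\Big|\Big(\frac{\mathbf{1}^T W_n}{N} \otimes I_d\Big)\gamma_n Y_n\Big|^2\Big] \le C''\gamma_n^2\,\mathbb{E}|Y_n|^2 \le C_2 C''\gamma_n^2\,(1 + u_{n-1} + v_{n-1}),$$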
where we used the bound $\mathbb{E}[|Y_n|^2] \le C_2(1 + u_{n-1} + v_{n-1})$. Putting this together with (22),

$$\mathbb{E}[|\langle\theta_n\rangle - \langle\theta_{n-1}\rangle|^2] \le C\big(u_{n-1} + \gamma_n^2(1 + u_{n-1} + v_{n-1}) + \gamma_n\sqrt{u_{n-1}(1 + u_{n-1} + v_{n-1})}\big)$$

where $C > 0$ is some constant chosen large enough. Plugging the above inequality into (21),

$$v_n \le v_{n-1} + (K_{Lip}C)\,u_{n-1} + (\sqrt{C_1 C_2} + K_{Lip}C)\,\gamma_n\sqrt{u_{n-1}(1 + u_{n-1} + v_{n-1})} + K_{Lip}C\,\gamma_n^2\,(1 + u_{n-1} + v_{n-1}).$$

This proves that (17) holds for any $M$ chosen large enough.

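Corollary 1 (of Lemma 1): Under the assumptions of Lemma 1, $\sup_n \mathbb{E}[|Y_n|^2] < \infty$.
Proof: By assumption,

$$\mathbb{E}[|Y_n|^2] = \mathbb{E}\big[\mathbb{E}_{\theta_{n-1}}[|Y|^2]\big] \le C_2\big(1 + \mathbb{E}[V(\langle\theta_{n-1}\rangle)] + \mathbb{E}[|\theta_{\perp,n-1}|^2]\big). \tag{23}$$

The proof is concluded by Lemma 1. ■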

C. Proof of Theorems 1 and 2

We give the proof of Theorem 2; the proof of Theorem 1 is on the same lines and details are
omitted. By Lemma 1, $(\theta_{\perp,n})_{n \ge 1}$ converges to zero w.p.1 and in $L^2$. Therefore, the study of the
whole vector $\theta_n$ is reduced to the analysis of its projection $J\theta_n = \mathbf{1} \otimes \langle\theta_n\rangle$ onto the consensus
space. We now focus on the average $\langle\theta_n\rangle$. The convergence of the sequence $(\langle\theta_n\rangle)_{n \ge 1}$ is a direct
consequence of Lemma 2 along with [27, Theorems 2.2. and 2.3.].

Lemma 2: Under Assumptions [1^-b), |2} |^§jc]), ^l and |5} it holds: 



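$$\langle\theta_n\rangle = \langle\theta_{n-1}\rangle + \gamma_n h(\langle\theta_{n-1}\rangle) + \gamma_n\zeta_n$$

with $\sup_n \big|\sum_{k=1}^n \gamma_k\zeta_k\big| < \infty$ almost surely.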

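Proof: Eqs. (2) and (6), along with the assumptions of the lemma, yield:

$$\langle\theta_n\rangle = \langle\theta_{n-1}\rangle + \gamma_n\langle Z_n\rangle, \quad \text{where } Z_n := (W_n \otimes I_d)(Y_n + \gamma_n^{-1}\theta_{\perp,n-1}) \tag{24}$$

upon noting that under Assumption 1, $(W_n \otimes I_d)J = J$. We write $\langle Z_n\rangle = h(\langle\theta_{n-1}\rangle) + e_n + \xi_n$
where

$$e_n := \langle (W_n \otimes I_d)(Y_n + \gamma_n^{-1}\theta_{\perp,n-1})\rangle - \mathbb{E}_{\theta_{n-1}}[\langle Y\rangle],$$

$$\xi_n := \mathbb{E}_{\theta_{n-1}}[\langle Y\rangle] - \mathbb{E}_{\mathbf{1}\otimes\langle\theta_{n-1}\rangle}[\langle Y\rangle].$$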
By assumption and the inequality $2ab \le a^2 + b^2$, there exists a constant $C$ such that

$$\mathbb{E}\Big[\sum_{n \ge 1}\gamma_n|\xi_n|\Big] \le C\Big(\sum_{n \ge 1}\gamma_n^2 + \sum_{n \ge 1}\mathbb{E}[|\theta_{\perp,n-1}|^2]\Big). \tag{25}$$

Therefore, the RHS in (25) is finite under Assumption 2 and Lemma 1, thus implying that
$\sum_{n \ge 1}\gamma_n\xi_n$ converges w.p.1.

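Since $\mathbb{E}[e_n \mid \mathcal{F}_{n-1}] = 0$, the sequence $(S_n := \sum_{k=1}^n \gamma_k e_k)_{n \ge 1}$ is a martingale. We prove that it
converges almost surely by estimating its second order moment. For any $k \ge 1$,

$$\mathbb{E}[|S_k|^2] \le \sum_{n \ge 1}\gamma_n^2\,\mathbb{E}[|e_n|^2] \le \sum_{n \ge 1}\gamma_n^2\,\mathbb{E}\big[(Y_n + \gamma_n^{-1}\theta_{\perp,n-1})^T P_n (Y_n + \gamma_n^{-1}\theta_{\perp,n-1})\big]$$

where we set $P_n := N^{-2}\,W_n^T\mathbf{1}\mathbf{1}^T W_n \otimes I_d$. Note that $P_n$ is independent of $Y_n$ conditionally to
$\mathcal{F}_{n-1}$. Since $W_n$ is a stochastic matrix, its spectral norm is bounded uniformly in $n$. Therefore,
there exists a constant $C > 0$ such that:

$$\mathbb{E}[|S_k|^2] \le C\sum_{n \ge 1}\gamma_n^2\,\mathbb{E}[|Y_n + \gamma_n^{-1}\theta_{\perp,n-1}|^2] \le 2C\Big(\sum_{n \ge 1}\gamma_n^2\,\mathbb{E}[|Y_n|^2] + \sum_{n \ge 1}\mathbb{E}[|\theta_{\perp,n-1}|^2]\Big).$$

By Lemma 1, Corollary 1 and Assumption 2, it follows that $\sup_n \mathbb{E}[|S_n|^2]$ is finite, thus implying
that the martingale $(S_n)_{n \ge 1}$ converges almost surely to a r.v. which is finite w.p.1 (see e.g. [28,
Corollary 2.2.]). This concludes the proof.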
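The martingale argument above can be illustrated numerically. The sketch below is only an illustration under simplifying assumptions that are not taken from the paper: the increments $e_k$ are replaced by i.i.d. standard Gaussian variables and the step size is $\gamma_k = 1/k$, so that $\sum_k \gamma_k^2 < \infty$ and the partial sums $S_n$ are bounded in $L^2$.

```python
import random


def martingale_path(num_steps, seed=0):
    """Partial sums S_n = sum_{k<=n} gamma_k * e_k with gamma_k = 1/k and
    i.i.d. standard Gaussian increments e_k (a stand-in for the noise)."""
    rng = random.Random(seed)
    s = 0.0
    path = []
    for k in range(1, num_steps + 1):
        s += (1.0 / k) * rng.gauss(0.0, 1.0)
        path.append(s)
    return path


path = martingale_path(20000)
# Oscillation of the tail: max |S_n - S_10000| over 10000 <= n <= 20000
tail = path[9999:]
tail_oscillation = max(abs(x - tail[0]) for x in tail)
print("S_10000 =", tail[0])
print("tail oscillation =", tail_oscillation)
```

The tail oscillation is small because the variance of $S_n - S_{10000}$ is bounded by $\sum_{k > 10000} k^{-2}$, which is of order $10^{-4}$.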



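D. Proof of Theorem 3

Set $V_n := (I_N - \mathbf{1}\mathbf{1}^T/N)W_n$ and for any $1 \le k \le n$,

$$\Phi_{n,k} := (V_n \otimes I_d)(V_{n-1} \otimes I_d)\cdots(V_k \otimes I_d). \tag{26}$$

Note that by Assumption 1,

$$\|\Phi_{n,k}X\|_2^2 = \mathbb{E}[X^T\Phi_{n-1,k}^T(V_n^T V_n \otimes I_d)\Phi_{n-1,k}X] = \mathbb{E}[X^T\Phi_{n-1,k}^T\,\mathbb{E}(V_n^T V_n \otimes I_d)\,\Phi_{n-1,k}X] \le \rho\,\mathbb{E}[X^T\Phi_{n-1,k}^T\Phi_{n-1,k}X] = \rho\,\|\Phi_{n-1,k}X\|_2^2. \tag{27}$$

From the update equation of the algorithm, and since $J_\perp(W_n \otimes I_d) = J_\perp(W_n \otimes I_d)J_\perp = (V_n \otimes I_d)J_\perp$ by Assumption 1, it holds
for any $n \ge 1$, $\theta_{\perp,n} = (V_n \otimes I_d)(\theta_{\perp,n-1} + \gamma_n Y_{\perp,n})$. By induction,

$$\theta_{\perp,n} = \sum_{k=1}^n \gamma_k\Phi_{n,k}Y_{\perp,k} + \Phi_{n,1}\theta_{\perp,0} \tag{28}$$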
where $\Phi_{n,k}$ is defined by (26). By (27) and Assumption 1, the second term in the RHS of (28)
is $O_{L^2}(\rho^{n/2})$. We now consider the first term in the RHS of (28). Using Minkowski's inequality
and Equation (27),

$$\Big\|\sum_{k=1}^n \gamma_k\Phi_{n,k}Y_{\perp,k}\Big\|_2 \le \sum_{k=1}^n \gamma_k\|\Phi_{n,k}Y_{\perp,k}\|_2 \le \sum_{k=1}^n \gamma_k\,\rho^{(n-k+1)/2}\,\|Y_{\perp,k}\|_2.$$



By [29, Result 178,pp.38], the RHS is upper bounded by C p{l — y/p) ^ with C := limsup^.^^^ llXL.nlh: 
which is finite by Corollary 1. This concludes the proof.
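As an illustration of the contraction factor appearing in (27), the sketch below computes the largest eigenvalue of $\mathbb{E}[V_n^T V_n]$ for one specific gossip model. The model is an assumption made for the example and is not taken from the paper: at each step a pair of nodes is drawn uniformly among all pairs of a complete graph and the two nodes average their values. Function names are illustrative.

```python
def gossip_matrix(n, i, j):
    """Pairwise averaging: nodes i and j replace their values by the mean."""
    w = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    w[i][i] = w[j][j] = 0.5
    w[i][j] = w[j][i] = 0.5
    return w


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def expected_vtv(n):
    """E[V^T V] with V = (I - 11^T/N) W, W uniform over all pairwise gossip matrices."""
    proj = [[(1.0 if r == c else 0.0) - 1.0 / n for c in range(n)] for r in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    acc = [[0.0] * n for _ in range(n)]
    for i, j in pairs:
        v = matmul(proj, gossip_matrix(n, i, j))
        vtv = matmul(transpose(v), v)
        for r in range(n):
            for c in range(n):
                acc[r][c] += vtv[r][c] / len(pairs)
    return acc


def spectral_norm_psd(m, iters=200):
    """Largest eigenvalue of a symmetric positive semidefinite matrix (power iteration)."""
    n = len(m)
    x = [float(k + 1) for k in range(n)]
    for _ in range(iters):
        y = [sum(m[r][c] * x[c] for c in range(n)) for r in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    y = [sum(m[r][c] * x[c] for c in range(n)) for r in range(n)]
    return sum(a * b for a, b in zip(x, y))


n = 5
rho = spectral_norm_psd(expected_vtv(n))
print("rho =", rho)
```

For this particular model one can check by hand that $\mathbb{E}[V_n^T V_n] = \frac{N-2}{N-1}(I_N - \mathbf{1}\mathbf{1}^T/N)$, so the contraction factor is $(N-2)/(N-1)$, which is strictly less than one.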

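E. Proof of Theorem 4

By assumption, $\lim_n \rho^{n/2}\gamma_n^{-1} = 0$. Therefore, by Theorem 3, the sequence of
r.v. $(\gamma_n^{-1/2}\theta_{\perp,n})_n$ converges in probability to zero. Since $\theta_n = \mathbf{1} \otimes \langle\theta_n\rangle + \theta_{\perp,n}$, it remains to
prove that the sequence of r.v. $(\gamma_n^{-1/2}(\langle\theta_n\rangle - \theta_*))_{n \ge 0}$ converges in distribution to $Z$ (under the
conditional distribution given the event $\{\lim_q \theta_q = \mathbf{1} \otimes \theta_*\}$ which, under Lemma 1, is the same
as the conditional distribution given the event $\{\lim_q \langle\theta_q\rangle = \theta_*\}$). To that goal, we write

$$\langle\theta_n\rangle - \theta_* = \langle\theta_{n-1}\rangle - \theta_* + \gamma_n h(\langle\theta_{n-1}\rangle) + \gamma_n e_n 1_{|\theta_{n-1} - \mathbf{1}\otimes\theta_*| \le \delta} + \gamma_n\xi_n + \gamma_n e_n 1_{|\theta_{n-1} - \mathbf{1}\otimes\theta_*| > \delta}$$

where $\xi_n := \mathbb{E}_{\theta_{n-1}}[\langle Y\rangle] - \mathbb{E}_{\mathbf{1}\otimes\langle\theta_{n-1}\rangle}[\langle Y\rangle]$ and $e_n := \langle Y_n\rangle - \mathbb{E}_{\theta_{n-1}}[\langle Y\rangle]$,
since $\mathbf{1}^T W_n = \mathbf{1}^T$. We then check the conditions C1 to C4 of [20, Theorem 1] (see also [30,
Theorem 1]). Under Assumption 6, the conditions C1 and C4 of [20, Theorem 1]
are satisfied. We now prove C2b: there exists a constant $C$ such that

$$\mathbb{E}\big[|e_{n+1}|^{2+\tau}1_{|\theta_n - \mathbf{1}\otimes\theta_*| \le \delta}\big] \le C\,\mathbb{E}\big[|\mathbb{E}_{\theta_n}[\langle Y\rangle]|^{2+\tau}1_{|\theta_n - \mathbf{1}\otimes\theta_*| \le \delta}\big] + C\,\mathbb{E}\big[|\langle Y_{n+1}\rangle|^{2+\tau}1_{|\theta_n - \mathbf{1}\otimes\theta_*| \le \delta}\big] \le 2C\sup_{|\theta - \mathbf{1}\otimes\theta_*| \le \delta}\mathbb{E}_\theta\big[|\langle Y\rangle|^{2+\tau}\big]$$

and the RHS is finite under Assumption 7. For C2c, we have

$$\mathbb{E}[e_{n+1}e_{n+1}^T \mid \mathcal{F}_n]\,1_{|\theta_n - \mathbf{1}\otimes\theta_*| \le \delta} = \big(\mathbb{E}_{\theta_n}[\langle Y\rangle\langle Y\rangle^T] - \mathbb{E}_{\theta_n}[\langle Y\rangle]\,(\mathbb{E}_{\theta_n}[\langle Y\rangle])^T\big)\,1_{|\theta_n - \mathbf{1}\otimes\theta_*| \le \delta}.$$

By Assumptions 4 and 7, this term converges w.p.1 to its limiting covariance matrix on the set $\{\lim_k \theta_k = \mathbf{1} \otimes \theta_*\}$ and since
$\{\lim_k \theta_k = \mathbf{1} \otimes \theta_*\} = \{\lim_k \langle\theta_k\rangle = \theta_*\}$ w.p.1 (as a consequence of Lemma 1), it also converges
w.p.1 to the same limit on the set $\{\lim_k \langle\theta_k\rangle = \theta_*\}$. This concludes the proof of C2.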
We now consider the condition C3 of [20] with $r_n = \xi_n + e_n 1_{|\theta_{n-1} - \mathbf{1}\otimes\theta_*| > \delta}$. By Assumption 3,
Theorem 3 and Lemma 1, there exists a constant $C$ such that

and the RHS tends to zero as $n \to \infty$. On the set $\{\lim_n \langle\theta_n\rangle = \theta_*\}$ (which, as discussed above,
is equal w.p.1 to the set $\{\lim_n \theta_n = \mathbf{1} \otimes \theta_*\}$), the r.v. $e_n 1_{|\theta_{n-1} - \mathbf{1}\otimes\theta_*| > \delta}$ is null for all large $n$ so
that $\gamma_n\sum_{k=1}^n e_k 1_{|\theta_{k-1} - \mathbf{1}\otimes\theta_*| > \delta}$ is $O_{w.p.1}(1)\,o_{L^1}(1)$. This concludes the proof of the condition C3 of
[20], and the proof of Theorem 4.

F. Proof of Theorem 5

We preface the proof by a preliminary result, established by [20, Theorem 2] (see also [19]
for a similar result obtained under stronger assumptions).

Theorem 6: Let $(\gamma_n)_n$ be a deterministic positive sequence such that $\log(\gamma_k/\gamma_{k+1}) = o(\gamma_k)$
and satisfying the relevant step-size assumption. Consider the random sequence $(u_n)_n$ given by

$$u_{n+1} = u_n + \gamma_{n+1}h(u_n) + \gamma_{n+1}e_{n+1} + \gamma_{n+1}\xi_{n+1}, \quad u_0 \in \mathbb{R}^d,$$

where
AVER 1:
(a) $u_*$ is a zero of the mean field: $h(u_*) = 0$.

(b) the mean field $h: \mathbb{R}^d \to \mathbb{R}^d$ is twice continuously differentiable (in a neighborhood of $u_*$)
and $\nabla h(u_*)$ is a Hurwitz matrix.
AVER 2:
(a) $(e_n)_{n \ge 1}$ is a $\mathcal{F}_n$-adapted martingale-increment sequence.

(b) There exist $\tau > 0$ and $\delta \in (0, +\infty]$ s.t. $\sup_k \mathbb{E}\big[|e_k|^{2+\tau}1_{|u_{k-1} - u_*| \le \delta}\big] < \infty$.

(c) There exists a positive definite (random) matrix $U_*$ such that on the set $\{\lim_q u_q = u_*\}$,
$\lim_k \mathbb{E}[e_k e_k^T \mid \mathcal{F}_{k-1}] = U_*$ almost surely.

AVER 3: $(\xi_n)_{n \ge 1}$ is a $\mathcal{F}_n$-adapted sequence s.t.

(a) $\gamma_n^{-1/2}|\xi_n|\,1_{\lim_q u_q = u_*} = O_{w.p.1}(1)\,O_{L^2}(1)$,

(b) $n^{-1/2}\sum_{k=0}^n \xi_{k+1}1_{\lim_q u_q = u_*}$ converges to zero in probability.
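Then for any $t \in \mathbb{R}^d$,

$$\lim_n \mathbb{E}\Big[1_{\lim_q u_q = u_*}\exp\Big(i\sqrt{n}\,t^T\Big(\frac{1}{n}\sum_{k=1}^n u_k - u_*\Big)\Big)\Big] = \mathbb{E}\Big[1_{\lim_q u_q = u_*}\exp\Big(-\frac{1}{2}\,t^T\,\nabla h(u_*)^{-1}\,U_*\,\nabla h(u_*)^{-T}\,t\Big)\Big].$$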



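Proof of Theorem 5: By Theorem 3 and Assumption 1, the normalized sum of the disagreement vectors $\theta_{\perp,n}$ converges in $L^2$ to
zero. Since $\theta_n = \theta_{\perp,n} + \mathbf{1} \otimes \langle\theta_n\rangle$, we now prove a CLT for the averaged sequence of the $\langle\theta_n\rangle$.
To that goal, we check the assumptions AVER1 to AVER3 of Theorem 6 with $u_n = \langle\theta_n\rangle$ and
$e_n$, $\xi_n$ defined as in the proof of Theorem 4. Under Assumption 6, AVER1 holds. AVER2 is
proved along the same lines as in the proof of Theorem 4. Finally, by assumption and
Theorem 3, $\gamma_n^{-1}\mathbb{E}\big[|\xi_n|^2 1_{\lim_k \theta_k = \mathbf{1}\otimes\theta_*}\big] = O(\gamma_n)$; and the normalized sum appearing in AVER3(b) is controlled by a bound whose
RHS tends to zero under the assumptions of the theorem, thus showing AVER3.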



Appendix 
Lemma 3: Let $(\gamma_n)_{n \ge 0}$, $(\rho_n)_{n \ge 0}$ be respectively a positive and a $[0,1]$-valued sequence such
that $\sum_n \gamma_n^2 < \infty$, and $u_n$, $v_n$ be two real sequences such that (16) and (17) hold true for $n \ge n_0$,
and $u_{n_0} + v_{n_0} < \infty$. Then: i) $\sup_n v_n < \infty$, ii) $\limsup_{n\to\infty} \phi_n u_n < \infty$ for any positive sequence
$(\phi_n)_{n \ge 0}$ such that



limsup I 7nV0n + 
n V 

^(Pn^ <00 . 



M-1 



< OO , liminf(7„i 



1 / <Pn-l 



- p„ ) > , (29) 
(30) 



Remark 2: If the sequences (7„,p„)n>o are such that 



lini sup 



In , 1 - Pn-l\ ^ r ■ t ^ 

< OO , limmi 



,7n-l 1-Pn 

5Z7n(l-Pn)"'<00 , 



2 ^,2 



(1-Pn-l)^ 7 



1 - Pn V (1 - PnY 7n-l 



Pn > ,(31) 



(32) 



then the conditions (29) and (30) are satisfied with 0„ := (1 — PnY /in- Examples of sequences 
satisfying these conditions are p„ = 1 — a/n^, 7„ = 70 /ri^ with 0<?7<1A(^ — 1/2). 
Proof: • Set $\tilde\gamma_n = (1 + M)\gamma_n$. Define two sequences $(a_n, b_n)_{n \ge n_0}$ such that $a_{n_0} = b_{n_0} =
\max(u_{n_0}, v_{n_0})$ and for each $n \ge n_0 + 1$:

$$a_n = \rho_n a_{n-1} + \tilde\gamma_n\sqrt{a_{n-1}}\,(1 + a_{n-1} + b_{n-1})^{1/2} + \tilde\gamma_n^2\,(1 + a_{n-1} + b_{n-1}) \tag{33}$$

$$b_n = b_{n-1} + M a_{n-1} + \tilde\gamma_n\sqrt{a_{n-1}}\,(1 + a_{n-1} + b_{n-1})^{1/2} + \tilde\gamma_n^2\,(1 + a_{n-1} + b_{n-1}). \tag{34}$$

It is straightforward to show by induction that $u_n \le a_n$ and $v_n \le b_n$ for any $n \ge n_0$. In addition,
$b_n = b_{n-1} + a_n + (M - \rho_n)a_{n-1}$. Thus for $n \ge n_0 + 1$,

$$b_n = a_n + \sum_{k=n_0}^{n-1}(M + 1 - \rho_{k+1})\,a_k.$$
Define A„ := (M + 1) Yll=no '^'=' '^ — "^o- ^^^ above equality implies that an <bn < A^. As a 



consequence, Eq. (33) implies:

$$a_n \le \rho_n a_{n-1} + \tilde\gamma_n\sqrt{a_{n-1}}\,(1 + 2A_{n-1})^{1/2} + \tilde\gamma_n^2\,(1 + 2A_{n-1}).$$



As (A„)„>„Q is a positive increasing sequence, for any n > uq + 1, 

1/2 



Ar 



< 



Pn 



a-n-l 
An-1 



+ 7n 



Ctn-1 



An-1 \Ang 



1 



2 1 +7^ 



A, 



+ 2 



no 



(35) 



(36) 



Define $L := 1/A_{n_0} + 2$, and $c_n := \phi_n a_n/A_n$. By (36), for any $n \ge n_0 + 1$



Cn < Pn- Cn-l + L7„a/c„_i0„ 

\-l V 0n-l 



+ ^^ 7n0n, 



(37) 



and under the assumption (29), there exist $n_1 \ge n_0$ and a positive constant such that for any
$n \ge n_1$,

(38) 



'M-i 



Lni + CL% 



n-l 



, I rn—l 
< I — pn I I7n^ 



Define 



,1 1 
A := max | -,—,Cn^ 



(39) 



We prove by induction on $n$ that $c_n \le A$ for any $n \ge n_1$. The claim holds true for $n = n_1$ by
definition of $A$. Assume that $c_{n-1} \le A$ for some $n - 1 \ge n_1$. Using (37) and (39), for $n \ge n_1 + 1$,



A (pn-l 



+ ^^7nV0n 

y A V HJn-l 



+ -J ll<Pn, 



By (38), the RHS is less than one so that $c_n \le A$. This proves that $(c_n)_{n \ge n_0}$ is a bounded
sequence.
We prove that $(A_n)_{n \ge n_0}$ is a bounded sequence. Using the fact that $\sup_{n \ge n_0}\rho_n \le 1$, that $(A_n)_{n \ge n_0}$
is increasing, and Eq. (35), it holds for $n \ge n_1 + 1$



An = An-i + a„ < An-i + a„_i + 7„^a„„i a/A^L^/^ + 7^L^y4„_i 
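Finally, since $\sup_{n \ge n_1} c_n \le A$ and $(1 + t^2) \le \exp(t^2)$, there exists $C > 0$ s.t. for any $n \ge
n_1 + 1$, $A_n \le \exp\big(C\{\phi_{n-1}^{-1} + \gamma_n^2\}\big)A_{n-1}$ (note that under (29), $\limsup_n\{\gamma_n/\sqrt{\phi_n}\}\phi_n < \infty$). By
assumption, $\sum_n\{\phi_{n-1}^{-1} + \gamma_n^2\} < \infty$; $(A_n)_{n \ge n_0}$ is therefore bounded.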

• The proof of the lemma is concluded upon noting that $v_n \le b_n \le A_n$ and $u_n \le a_n \le \phi_n^{-1}c_n A_n$.
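The behaviour of the comparison sequences $(a_n, b_n)$ defined by (33) and (34) can be explored numerically. The sketch below iterates both recursions with equality, for illustrative parameter values that are not taken from the paper: a constant $\rho_n = 0.5$, step sizes $\gamma_n = 0.1/n$, $M = 1$, and a common initial value equal to one. It is meant as a sanity check of the qualitative conclusion (bounded $b_n$, vanishing $a_n$), not as a verification of the conditions (29) and (30).

```python
import math


def iterate_bounds(num_steps, big_m=1.0, rho=0.5, gamma0=0.1, start=1.0):
    """Iterate the comparison recursions (33)-(34) with equality.

    Illustrative choices (not taken from the paper): constant rho_n = rho,
    step sizes gamma_n = gamma0 / n, and a common initial value `start`.
    """
    a = b = start
    trace = [(a, b)]
    for n in range(1, num_steps + 1):
        g = (1.0 + big_m) * gamma0 / n           # tilde-gamma_n = (1 + M) * gamma_n
        s = 1.0 + a + b
        cross = g * math.sqrt(a) * math.sqrt(s)  # tilde-gamma_n * sqrt(a_{n-1}) * (1 + a_{n-1} + b_{n-1})^{1/2}
        quad = g * g * s                         # tilde-gamma_n^2 * (1 + a_{n-1} + b_{n-1})
        a_next = rho * a + cross + quad          # Eq. (33)
        b_next = b + big_m * a + cross + quad    # Eq. (34)
        a, b = a_next, b_next
        trace.append((a, b))
    return trace


trace = iterate_bounds(20000)
for n in (10, 100, 1000, 10000, 20000):
    a_n, b_n = trace[n]
    print(n, a_n, b_n)
```

With these values $b_n$ stabilizes after a short transient while $a_n$ decays roughly like $n^{-2}$, in line with the roles played by $v_n$ and $u_n$ in Lemma 3.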
Fig. 1. $N = 40$ sensors (diamonds) with the graph (line segments) and the source (star).
Fig. 2. Cumulative relative error (over the $N$ sensors) when estimating $\theta_*$, as a function of the number of iterations.